# Forecast report
Team: CaiXiaoYi, DingAoDi, DuNaiHe, LiaoNing, WangLin
The analysis of temporal data is an important issue in current research, because most real-world data either explicitly or implicitly contain some information about time.
The forecast data come in three sheets: mass, food and drug.
In treating the data, we did not forecast the three sheets together, because they differ in certain real-world factors.
The data come from the United States, so I looked into how American consumption habits differ across mass, food and drug.
For example, FOOD sells better because in the US people like to stop at a FOOD store after work or school to buy fresh pies, cakes and other fast food. Such FMCG products need to be bought fresh, so people do not stock up on them.
MASS stores are the largest, yet they do not sell more than FOOD. MASS suits family-level shopping trips; it may operate like a wholesale market and may have a scale cost advantage over DRUG, so its sales sit in the mid range. People like to drive there with the whole family at the weekend to buy food for a week or so, but fresh pie has a limited expiry date, so they do not buy much of it at a time. DRUG, like the small supermarkets near home, carries only a single flavour of pie, so it has the lowest sales.
Therefore, we did not combine the data from these three size classes of supermarkets in this forecast.
This forecast has certain limitations, because the yearly series is very short and some years are incomplete: 2004 and 2009 each contain only half a year of data. In the traditional model (time regression), we therefore estimate each annual total from the monthly averages and then use these five totals to forecast the annual trend. Predicting three points from five is inherently difficult, so only the trend is predicted in the annual data.
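The annualisation step described above can be sketched as follows. This is only an illustration under assumptions: the turnover figures are copied from the `mass_year` output of this report, the month counts are inferred from the months listed in the data (July to December 2004, January to June 2009), the names `mass_year_demo`, `n_months` and `annualised` are hypothetical, and the three-year horizon 2010 to 2012 is an assumed reading of "three points". The report does not state which five totals were used, so this sketch simply keeps all six years.

```r
# Yearly turnover of the mass sheet, copied from the mass_year output
mass_year_demo <- data.frame(
  year        = 2004:2009,
  weight_sumy = c(864590.1, 1522053.7, 2248999.1, 2387527.6, 2417313.4, 800509.5),
  n_months    = c(6, 12, 12, 12, 12, 6)  # assumption: months observed per year
)

# Annualise each year: monthly average times 12
mass_year_demo$annualised <-
  mass_year_demo$weight_sumy / mass_year_demo$n_months * 12

# Time regression: a straight-line trend through the annualised totals
fit <- lm(annualised ~ year, data = mass_year_demo)

# Extrapolate the trend three years ahead (assumed horizon)
trend_forecast <- predict(fit, newdata = data.frame(year = 2010:2012))
print(mass_year_demo)
print(trend_forecast)
```

With so few points the fitted line is very sensitive to each annual total, which is why only the direction of the trend should be read from it.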
In terms of model use, our team used not only traditional models (stl, ETS, HW, time regression) but also some newer ones, such as auto_arima_xgboost, randomForest, earth, prophet_xgboost, stlm_ets, stlm_arima and prophet, for a dozen models in total. Some of them combine a traditional model with a machine learning algorithm to good effect, and these are popular in Kaggle competitions.
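As an illustration of one of the traditional models listed above, the following is a minimal sketch of an STL decomposition using only base R (`stats::stl`). It is not the team's actual modelling code: the series is the monthly food volume rounded from the `food_month` output of this report, and the names `food_vol`, `food_vol_ts` and `fit` are hypothetical.

```r
# Monthly food volume, July 2004 to June 2009 (60 values, rounded)
food_vol <- c(
  14580047, 14809138, 15864934, 17125286, 20660633, 15922816,
  12963245, 12795191, 14373304, 13649767, 13877405, 13311289,
  13453651, 14053983, 15920217, 16409994, 21044428, 16387732,
  13403200, 13562728, 15348004, 14298923, 13635635, 13384882,
  13270426, 13210034, 13710296, 14707820, 19268179, 14473257,
  11343783, 11147826, 12386796, 11951513, 12634818, 12676694,
  12036229, 12535094, 13214117, 14616720, 18908816, 13255560,
  11116519, 10703282, 11773157, 11475396, 11807544, 11501571,
  11730317, 11499209, 11687785, 12939811, 17506533, 12890286,
  10582184, 10634505, 13037014, 12569873, 12705934, 12323965
)

# Monthly time series starting in July 2004
food_vol_ts <- ts(food_vol, start = c(2004, 7), frequency = 12)

# STL splits the series into seasonal + trend + remainder;
# s.window = "periodic" forces the same seasonal pattern every year
fit <- stl(food_vol_ts, s.window = "periodic")
head(fit$time.series)
plot(fit)
```

The seasonal component isolates the November peak that is visible in the raw numbers, while the trend component shows the slower movement that the annual forecast tries to capture.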
The report is exported in HTML format because this way plotly can be used and the results can be viewed interactively.
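# author: Cai
# rm(list = ls())
library(openxlsx)   # read.xlsx
library(stringr)    # str_c
library(lubridate)  # year
library(dplyr)      # %>% and data-frame verbs
library(ggplot2)    # ggplot, geom_line

# Extract one product from a sheet: column 1 holds the month, column
# (1 + cnum) the volume and column (73 + cnum) the price of product cnum.
get_data_by_no <- function(df, cnum) {
  first_cnum <- 1 + cnum
  second_cnum <- 73 + cnum
  res <- data.frame(df[1], df[first_cnum], df[second_cnum])
  # skip the first row of the sheet; the monthly observations start at row 2
  res <- res[2:nrow(res), ]
  names(res) <- c("Month", "VOLUMN", "PRICE")
  return(res)
}

data_path <- "/Users/wangzuxian/Data_for_test1.xlsx"
mass <- read.xlsx(data_path, sheet = 1)
food <- read.xlsx(data_path, sheet = 2)
drug <- read.xlsx(data_path, sheet = 3)
cnum <- 7
mass <- get_data_by_no(mass, cnum)
food <- get_data_by_no(food, cnum)
drug <- get_data_by_no(drug, cnum)
# note: as written, `all` stacks mass and food only; drug is not included
all <- rbind(mass, food)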
mass
## Month VOLUMN PRICE
## 2 2004_Jul 91942.571429999996 0.54006200000000004
## 3 2004_Aug 80327.285709999996 0.64228099999999999
## 4 2004_Sep 67387.857139999993 0.98359300000000005
## 5 2004_Oct 81307.071429999996 1.3260529999999999
## 6 2004_Nov 172584.28570000001 2.1541800000000002
## 7 2004_Dec 116803.64290000001 1.8618030000000001
## 8 2005_Jan 67193.464290000004 1.111264
## 9 2005_Feb 65111.321430000004 1.109623
## 10 2005_Mar 81389.357139999993 1.255217
## 11 2005_Apr 90105.428570000004 0.96763299999999997
## 12 2005_May 136057.89290000001 0.78552500000000003
## 13 2005_Jun 107802.75 0.90408299999999997
## 14 2005_Jul 114619.78569999999 0.85580400000000001
## 15 2005_Aug 152440.71429999999 0.79672600000000005
## 16 2005_Sep 168344.78570000001 0.75808799999999998
## 17 2005_Oct 127759.21430000001 0.99165000000000003
## 18 2005_Nov 143462.07139999999 2.0077929999999999
## 19 2005_Dec 122154.53569999999 1.797272
## 20 2006_Jan 88740.25 1.008996
## 21 2006_Feb 87689.428570000004 1.055606
## 22 2006_Mar 123369.75 0.99135399999999996
## 23 2006_Apr 155312.03570000001 0.85621400000000003
## 24 2006_May 140064.92860000001 0.85486399999999996
## 25 2006_Jun 135176.78570000001 0.95822399999999996
## 26 2006_Jul 178455.57139999999 0.72394099999999995
## 27 2006_Aug 216618.21429999999 0.77141300000000002
## 28 2006_Sep 198734.92860000001 0.83291000000000004
## 29 2006_Oct 215120.07139999999 0.95692200000000005
## 30 2006_Nov 261636.85709999999 1.9120539999999999
## 31 2006_Dec 214985.07139999999 1.834578
## 32 2007_Jan 149417.78570000001 0.98325700000000005
## 33 2007_Feb 141932.71429999999 0.89841199999999999
## 34 2007_Mar 174709.42860000001 0.87739900000000004
## 35 2007_Apr 169455 0.94608199999999998
## 36 2007_May 168988.82139999999 0.94844300000000004
## 37 2007_Jun 162072.32139999999 1.191935
## 38 2007_Jul 137077.07139999999 1.318379
## 39 2007_Aug 167178.39290000001 1.0474460000000001
## 40 2007_Sep 181669.89290000001 1.0151950000000001
## 41 2007_Oct 165415.42860000001 1.3144169999999999
## 42 2007_Nov 173307.92860000001 2.3762050000000001
## 43 2007_Dec 126072.78569999999 2.1934399999999998
## 44 2008_Jan 105059.14290000001 1.4437770000000001
## 45 2008_Feb 149494.57139999999 1.0774969999999999
## 46 2008_Mar 103524.5714 1.7223170000000001
## 47 2008_Apr 174153.64290000001 1.1047309999999999
## 48 2008_May 197059 1.033892
## 49 2008_Jun 130458.9286 1.4606319999999999
## 50 2008_Jul 133018.07139999999 1.3096939999999999
## 51 2008_Aug 146845.21429999999 1.248834
## 52 2008_Sep 95496.964290000004 1.5976870000000001
## 53 2008_Oct 78102.107139999993 1.838252
## 54 2008_Nov 215058.28570000001 1.9767680000000001
## 55 2008_Dec 145347.67860000001 1.793644
## 56 2009_Jan 79226.178570000004 1.6557230000000001
## 57 2009_Feb 53561.14286 1.7516210000000001
## 58 2009_Mar 68385.392860000007 1.8233410000000001
## 59 2009_Apr 87508.821429999996 1.69678
## 60 2009_May 87781.785709999996 1.6274189999999999
## 61 2009_Jun 107918.5714 1.477811
food
## Month VOLUMN PRICE
## 2 2004_Jul 14580047.289999999 1.4920530000000001
## 3 2004_Aug 14809137.640000001 1.4252750000000001
## 4 2004_Sep 15864933.93 1.416803
## 5 2004_Oct 17125285.890000001 1.5655950000000001
## 6 2004_Nov 20660632.68 2.4059710000000001
## 7 2004_Dec 15922815.859999999 2.3659119999999998
## 8 2005_Jan 12963245.289999999 1.4825759999999999
## 9 2005_Feb 12795191 1.3988100000000001
## 10 2005_Mar 14373304.289999999 1.4715279999999999
## 11 2005_Apr 13649767.43 1.39777
## 12 2005_May 13877404.890000001 1.423743
## 13 2005_Jun 13311289.32 1.508758
## 14 2005_Jul 13453650.640000001 1.475236
## 15 2005_Aug 14053982.539999999 1.383799
## 16 2005_Sep 15920217.109999999 1.3034220000000001
## 17 2005_Oct 16409994.359999999 1.581861
## 18 2005_Nov 21044427.93 2.4530940000000001
## 19 2005_Dec 16387731.859999999 2.3168510000000002
## 20 2006_Jan 13403199.93 1.4844440000000001
## 21 2006_Feb 13562727.57 1.3823460000000001
## 22 2006_Mar 15348003.859999999 1.402471
## 23 2006_Apr 14298922.93 1.5179940000000001
## 24 2006_May 13635635.359999999 1.5351300000000001
## 25 2006_Jun 13384881.710000001 1.626436
## 26 2006_Jul 13270425.960000001 1.632979
## 27 2006_Aug 13210034.32 1.661734
## 28 2006_Sep 13710296.43 1.6706650000000001
## 29 2006_Oct 14707820.039999999 1.877999
## 30 2006_Nov 19268178.75 2.801139
## 31 2006_Dec 14473257.289999999 2.5505209999999998
## 32 2007_Jan 11343783.25 1.807471
## 33 2007_Feb 11147825.57 1.6887110000000001
## 34 2007_Mar 12386796.390000001 1.7166399999999999
## 35 2007_Apr 11951512.859999999 1.7857419999999999
## 36 2007_May 12634817.93 1.6813020000000001
## 37 2007_Jun 12676693.93 1.738186
## 38 2007_Jul 12036229.210000001 1.7763720000000001
## 39 2007_Aug 12535093.57 1.7050259999999999
## 40 2007_Sep 13214116.57 1.6688559999999999
## 41 2007_Oct 14616719.57 1.8079209999999999
## 42 2007_Nov 18908816.359999999 2.8379569999999998
## 43 2007_Dec 13255560.32 2.5543979999999999
## 44 2008_Jan 11116518.890000001 1.852541
## 45 2008_Feb 10703282.289999999 1.8026260000000001
## 46 2008_Mar 11773156.710000001 1.8430930000000001
## 47 2008_Apr 11475395.789999999 1.776068
## 48 2008_May 11807544.039999999 1.835863
## 49 2008_Jun 11501570.609999999 1.928809
## 50 2008_Jul 11730316.789999999 1.8281970000000001
## 51 2008_Aug 11499208.710000001 1.869837
## 52 2008_Sep 11687784.82 1.8807499999999999
## 53 2008_Oct 12939811.039999999 2.137677
## 54 2008_Nov 17506532.710000001 3.144949
## 55 2008_Dec 12890285.640000001 2.8854950000000001
## 56 2009_Jan 10582184.210000001 2.1201300000000001
## 57 2009_Feb 10634505.289999999 1.839275
## 58 2009_Mar 13037013.57 1.8388500000000001
## 59 2009_Apr 12569873.039999999 1.8409899999999999
## 60 2009_May 12705934.109999999 1.802271
## 61 2009_Jun 12323964.640000001 1.9154990000000001
drug
## Month VOLUMN PRICE
## 2 2004_Jul 342038.89289999998 0.91586999999999996
## 3 2004_Aug 353867.96429999999 0.87870499999999996
## 4 2004_Sep 353385 0.89099899999999999
## 5 2004_Oct 382660.53570000001 0.91813599999999995
## 6 2004_Nov 396199.82140000002 1.1610769999999999
## 7 2004_Dec 374950.21429999999 1.0878859999999999
## 8 2005_Jan 373119.71429999999 0.90710500000000005
## 9 2005_Feb 363302.75 0.95036500000000002
## 10 2005_Mar 373210.10710000002 0.91667500000000002
## 11 2005_Apr 356498.57140000002 0.89837999999999996
## 12 2005_May 373107.92859999998 0.88058899999999996
## 13 2005_Jun 316255.67859999998 0.88217500000000004
## 14 2005_Jul 309310.39289999998 0.90456899999999996
## 15 2005_Aug 318453.25 0.91396599999999995
## 16 2005_Sep 322646.35710000002 0.887374
## 17 2005_Oct 362755.67859999998 0.96897800000000001
## 18 2005_Nov 376950.71429999999 1.1442209999999999
## 19 2005_Dec 351865.07140000002 1.0549839999999999
## 20 2006_Jan 338764.78570000001 0.98413700000000004
## 21 2006_Feb 325562.71429999999 0.96184199999999997
## 22 2006_Mar 366729.03570000001 0.92859899999999995
## 23 2006_Apr 353113.17859999998 0.90074399999999999
## 24 2006_May 355834.21429999999 0.91462399999999999
## 25 2006_Jun 328466.85710000002 0.91182300000000005
## 26 2006_Jul 317527 0.95529200000000003
## 27 2006_Aug 330962.42859999998 0.96611800000000003
## 28 2006_Sep 345281.64289999998 0.96225899999999998
## 29 2006_Oct 344282.39289999998 1.0801959999999999
## 30 2006_Nov 339512.89289999998 1.278718
## 31 2006_Dec 315622.14289999998 1.1533739999999999
## 32 2007_Jan 303314.64289999998 1.0544720000000001
## 33 2007_Feb 307557.35710000002 1.0378909999999999
## 34 2007_Mar 343296.21429999999 1.02915
## 35 2007_Apr 337749.64289999998 1.0184660000000001
## 36 2007_May 349303.28570000001 1.0323960000000001
## 37 2007_Jun 319853.03570000001 1.039404
## 38 2007_Jul 308596.25 1.0461009999999999
## 39 2007_Aug 312512.32140000002 1.0452030000000001
## 40 2007_Sep 336597 1.035852
## 41 2007_Oct 364338.75 1.0903080000000001
## 42 2007_Nov 362315 1.28379
## 43 2007_Dec 342889.25 1.1375189999999999
## 44 2008_Jan 339879.89289999998 1.1090450000000001
## 45 2008_Feb 355203.60710000002 1.048227
## 46 2008_Mar 406840.39289999998 0.99135200000000001
## 47 2008_Apr 332608.28570000001 1.087043
## 48 2008_May 350392.21429999999 1.101599
## 49 2008_Jun 314237.82140000002 1.086986
## 50 2008_Jul 310671.96429999999 1.069949
## 51 2008_Aug 301160.75 1.1224879999999999
## 52 2008_Sep 320567.5 1.1311899999999999
## 53 2008_Oct 324321.32140000002 1.210744
## 54 2008_Nov 328502.64289999998 1.5387139999999999
## 55 2008_Dec 321003.64289999998 1.317642
## 56 2009_Jan 307358.85710000002 1.2528520000000001
## 57 2009_Feb 288944.57140000002 1.2352019999999999
## 58 2009_Mar 325536.89289999998 1.226621
## 59 2009_Apr 316638.39289999998 1.16137
## 60 2009_May 299196.14289999998 1.1962550000000001
## 61 2009_Jun 274674.64289999998 1.2098329999999999
all
## Month VOLUMN PRICE
## 2 2004_Jul 91942.571429999996 0.54006200000000004
## 3 2004_Aug 80327.285709999996 0.64228099999999999
## 4 2004_Sep 67387.857139999993 0.98359300000000005
## 5 2004_Oct 81307.071429999996 1.3260529999999999
## 6 2004_Nov 172584.28570000001 2.1541800000000002
## 7 2004_Dec 116803.64290000001 1.8618030000000001
## 8 2005_Jan 67193.464290000004 1.111264
## 9 2005_Feb 65111.321430000004 1.109623
## 10 2005_Mar 81389.357139999993 1.255217
## 11 2005_Apr 90105.428570000004 0.96763299999999997
## 12 2005_May 136057.89290000001 0.78552500000000003
## 13 2005_Jun 107802.75 0.90408299999999997
## 14 2005_Jul 114619.78569999999 0.85580400000000001
## 15 2005_Aug 152440.71429999999 0.79672600000000005
## 16 2005_Sep 168344.78570000001 0.75808799999999998
## 17 2005_Oct 127759.21430000001 0.99165000000000003
## 18 2005_Nov 143462.07139999999 2.0077929999999999
## 19 2005_Dec 122154.53569999999 1.797272
## 20 2006_Jan 88740.25 1.008996
## 21 2006_Feb 87689.428570000004 1.055606
## 22 2006_Mar 123369.75 0.99135399999999996
## 23 2006_Apr 155312.03570000001 0.85621400000000003
## 24 2006_May 140064.92860000001 0.85486399999999996
## 25 2006_Jun 135176.78570000001 0.95822399999999996
## 26 2006_Jul 178455.57139999999 0.72394099999999995
## 27 2006_Aug 216618.21429999999 0.77141300000000002
## 28 2006_Sep 198734.92860000001 0.83291000000000004
## 29 2006_Oct 215120.07139999999 0.95692200000000005
## 30 2006_Nov 261636.85709999999 1.9120539999999999
## 31 2006_Dec 214985.07139999999 1.834578
## 32 2007_Jan 149417.78570000001 0.98325700000000005
## 33 2007_Feb 141932.71429999999 0.89841199999999999
## 34 2007_Mar 174709.42860000001 0.87739900000000004
## 35 2007_Apr 169455 0.94608199999999998
## 36 2007_May 168988.82139999999 0.94844300000000004
## 37 2007_Jun 162072.32139999999 1.191935
## 38 2007_Jul 137077.07139999999 1.318379
## 39 2007_Aug 167178.39290000001 1.0474460000000001
## 40 2007_Sep 181669.89290000001 1.0151950000000001
## 41 2007_Oct 165415.42860000001 1.3144169999999999
## 42 2007_Nov 173307.92860000001 2.3762050000000001
## 43 2007_Dec 126072.78569999999 2.1934399999999998
## 44 2008_Jan 105059.14290000001 1.4437770000000001
## 45 2008_Feb 149494.57139999999 1.0774969999999999
## 46 2008_Mar 103524.5714 1.7223170000000001
## 47 2008_Apr 174153.64290000001 1.1047309999999999
## 48 2008_May 197059 1.033892
## 49 2008_Jun 130458.9286 1.4606319999999999
## 50 2008_Jul 133018.07139999999 1.3096939999999999
## 51 2008_Aug 146845.21429999999 1.248834
## 52 2008_Sep 95496.964290000004 1.5976870000000001
## 53 2008_Oct 78102.107139999993 1.838252
## 54 2008_Nov 215058.28570000001 1.9767680000000001
## 55 2008_Dec 145347.67860000001 1.793644
## 56 2009_Jan 79226.178570000004 1.6557230000000001
## 57 2009_Feb 53561.14286 1.7516210000000001
## 58 2009_Mar 68385.392860000007 1.8233410000000001
## 59 2009_Apr 87508.821429999996 1.69678
## 60 2009_May 87781.785709999996 1.6274189999999999
## 61 2009_Jun 107918.5714 1.477811
## 210 2004_Jul 14580047.289999999 1.4920530000000001
## 310 2004_Aug 14809137.640000001 1.4252750000000001
## 410 2004_Sep 15864933.93 1.416803
## 510 2004_Oct 17125285.890000001 1.5655950000000001
## 62 2004_Nov 20660632.68 2.4059710000000001
## 71 2004_Dec 15922815.859999999 2.3659119999999998
## 81 2005_Jan 12963245.289999999 1.4825759999999999
## 91 2005_Feb 12795191 1.3988100000000001
## 101 2005_Mar 14373304.289999999 1.4715279999999999
## 111 2005_Apr 13649767.43 1.39777
## 121 2005_May 13877404.890000001 1.423743
## 131 2005_Jun 13311289.32 1.508758
## 141 2005_Jul 13453650.640000001 1.475236
## 151 2005_Aug 14053982.539999999 1.383799
## 161 2005_Sep 15920217.109999999 1.3034220000000001
## 171 2005_Oct 16409994.359999999 1.581861
## 181 2005_Nov 21044427.93 2.4530940000000001
## 191 2005_Dec 16387731.859999999 2.3168510000000002
## 201 2006_Jan 13403199.93 1.4844440000000001
## 211 2006_Feb 13562727.57 1.3823460000000001
## 221 2006_Mar 15348003.859999999 1.402471
## 231 2006_Apr 14298922.93 1.5179940000000001
## 241 2006_May 13635635.359999999 1.5351300000000001
## 251 2006_Jun 13384881.710000001 1.626436
## 261 2006_Jul 13270425.960000001 1.632979
## 271 2006_Aug 13210034.32 1.661734
## 281 2006_Sep 13710296.43 1.6706650000000001
## 291 2006_Oct 14707820.039999999 1.877999
## 301 2006_Nov 19268178.75 2.801139
## 311 2006_Dec 14473257.289999999 2.5505209999999998
## 321 2007_Jan 11343783.25 1.807471
## 331 2007_Feb 11147825.57 1.6887110000000001
## 341 2007_Mar 12386796.390000001 1.7166399999999999
## 351 2007_Apr 11951512.859999999 1.7857419999999999
## 361 2007_May 12634817.93 1.6813020000000001
## 371 2007_Jun 12676693.93 1.738186
## 381 2007_Jul 12036229.210000001 1.7763720000000001
## 391 2007_Aug 12535093.57 1.7050259999999999
## 401 2007_Sep 13214116.57 1.6688559999999999
## 411 2007_Oct 14616719.57 1.8079209999999999
## 421 2007_Nov 18908816.359999999 2.8379569999999998
## 431 2007_Dec 13255560.32 2.5543979999999999
## 441 2008_Jan 11116518.890000001 1.852541
## 451 2008_Feb 10703282.289999999 1.8026260000000001
## 461 2008_Mar 11773156.710000001 1.8430930000000001
## 471 2008_Apr 11475395.789999999 1.776068
## 481 2008_May 11807544.039999999 1.835863
## 491 2008_Jun 11501570.609999999 1.928809
## 501 2008_Jul 11730316.789999999 1.8281970000000001
## 511 2008_Aug 11499208.710000001 1.869837
## 521 2008_Sep 11687784.82 1.8807499999999999
## 531 2008_Oct 12939811.039999999 2.137677
## 541 2008_Nov 17506532.710000001 3.144949
## 551 2008_Dec 12890285.640000001 2.8854950000000001
## 561 2009_Jan 10582184.210000001 2.1201300000000001
## 571 2009_Feb 10634505.289999999 1.839275
## 581 2009_Mar 13037013.57 1.8388500000000001
## 591 2009_Apr 12569873.039999999 1.8409899999999999
## 601 2009_May 12705934.109999999 1.802271
## 611 2009_Jun 12323964.640000001 1.9154990000000001
# author: Cai
# VOL_CORN CHIPS
# convert the volume and price columns to numeric
mass$VOLUMN <- as.numeric(mass$VOLUMN)
mass$PRICE <- as.numeric(mass$PRICE)
food$VOLUMN <- as.numeric(food$VOLUMN)
food$PRICE <- as.numeric(food$PRICE)
drug$VOLUMN <- as.numeric(drug$VOLUMN)
drug$PRICE <- as.numeric(drug$PRICE)
all$VOLUMN <- as.numeric(all$VOLUMN)
all$PRICE <- as.numeric(all$PRICE)
# weight_sum is the turnover (price x volume)
mass$weight_sum <- mass$VOLUMN * mass$PRICE
food$weight_sum <- food$VOLUMN * food$PRICE
drug$weight_sum <- drug$VOLUMN * drug$PRICE
all$weight_sum <- all$VOLUMN * all$PRICE
The monthly data are aggregated by quarter and by year; for price, the aggregate is the volume-weighted mean. Turnover is calculated as price x volume.
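To make the weighted mean concrete, here is a small worked example using the first three months of the mass sheet (2004_Jul to 2004_Sep) as printed earlier in this report. The variable names are hypothetical and the snippet is a sketch of the arithmetic only, not part of the report's pipeline.

```r
# Volume and price for 2004_Jul, 2004_Aug, 2004_Sep (mass sheet)
volume <- c(91942.57143, 80327.28571, 67387.85714)
price  <- c(0.540062, 0.642281, 0.983593)

turnover <- price * volume                     # monthly turnover
weighted_price <- sum(turnover) / sum(volume)  # volume-weighted mean price
simple_price <- mean(price)                    # unweighted mean, for comparison

# base R's weighted.mean() computes the same weighted quantity
builtin_price <- weighted.mean(price, w = volume)

print(c(weighted = weighted_price, simple = simple_price, builtin = builtin_price))
```

The weighted value is about 0.699, which agrees with `weight_mean` for 2004-Q3 in the `mass_quarter` output, whereas the simple mean of the three prices is about 0.722. Weighting by volume keeps a month with few sales from pulling the quarterly price as strongly as a month with many.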
# author: Cai
# Aggregate a monthly data frame to quarters.
quarter_data <- function(df) {
  # use the C locale so that %b parses English month abbreviations
  Sys.setlocale("LC_TIME", "C")
  # "2004_Jul" -> "1_2004_Jul" -> Date 2004-07-01
  month <- as.Date(str_c("1_", df$Month), format = "%d_%Y_%b")
  df$quarter <- str_c(year(month), "-", quarters(month))
  # quarterly volume and turnover (weight_sum is the turnover)
  vol_sum <- aggregate(df$VOLUMN, by = list(type = df$quarter), sum)
  weight_sum <- aggregate(df$weight_sum, by = list(type = df$quarter), sum)
  df_quarter <- data.frame(quarter = vol_sum$type,
                           vol_sum = vol_sum$x,
                           weight_sum = weight_sum$x)
  # volume-weighted mean price = turnover / volume
  df_quarter$weight_mean <- df_quarter$weight_sum / df_quarter$vol_sum
  return(df_quarter)
}
# Aggregate a monthly data frame to years.
year_data <- function(df) {
  # the year is the first four characters of a label such as "2004_Jul"
  df$year <- substring(df$Month, 1, 4)
  # yearly volume and turnover
  vol_sumy <- aggregate(df$VOLUMN, by = list(type = df$year), sum)
  weight_sumy <- aggregate(df$weight_sum, by = list(type = df$year), sum)
  df_year <- data.frame(year = vol_sumy$type,
                        vol_sumy = vol_sumy$x,
                        weight_sumy = weight_sumy$x)
  # volume-weighted mean price = turnover / volume
  df_year$weight_mean <- df_year$weight_sumy / df_year$vol_sumy
  return(df_year)
}
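The quarter labels produced by the aggregation functions above come from parsing month strings such as `2004_Jul`. The following base-R sketch shows that conversion in isolation; it uses `paste0` and `format` in place of `str_c` and `year`, and the names `months_raw`, `dates` and `quarter_labels` are hypothetical.

```r
# "%b" parsing depends on the locale; the C locale uses English abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

months_raw <- c("2004_Jul", "2004_Nov", "2005_Jan")

# prepend a day so that as.Date() has a complete date to parse
dates <- as.Date(paste0("1_", months_raw), format = "%d_%Y_%b")

# combine the four-digit year with the quarter name, e.g. "2004-Q3"
quarter_labels <- paste0(format(dates, "%Y"), "-", quarters(dates))
print(quarter_labels)
```

If the locale is not set and the system uses non-English month names, `as.Date` returns `NA` for these strings, so every month would fall into a single `NA` group.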
#author :Cai
mass_quarter <- quarter_data(mass)
food_quarter <- quarter_data(food)
drug_quarter <- quarter_data(drug)
all_quarter <- quarter_data(all)
mass_quarter
## quarter vol_sum weight_sum weight_mean
## 1 2004-Q3 239657.7 167529.6 0.6990370
## 2 2004-Q4 370695.0 697060.5 1.8804151
## 3 2005-Q1 213694.1 249080.0 1.1655912
## 4 2005-Q2 333966.1 291528.5 0.8729285
## 5 2005-Q3 435405.3 347165.7 0.7973392
## 6 2005-Q4 393375.8 634279.5 1.6124008
## 7 2006-Q1 299799.4 304407.1 1.0153693
## 8 2006-Q2 430553.8 382246.4 0.8878019
## 9 2006-Q3 593808.7 461821.7 0.7777281
## 10 2006-Q4 691742.0 1100523.8 1.5909455
## 11 2007-Q1 466059.9 427720.0 0.9177361
## 12 2007-Q2 500516.1 513774.3 1.0264889
## 13 2007-Q3 485925.4 540260.2 1.1118173
## 14 2007-Q4 464796.1 905773.1 1.9487535
## 15 2008-Q1 358078.3 491064.1 1.3713874
## 16 2008-Q2 501671.6 586683.1 1.1694566
## 17 2008-Q3 375360.2 510172.5 1.3591544
## 18 2008-Q4 438508.1 829393.7 1.8913989
## 19 2009-Q1 201172.7 349685.3 1.7382343
## 20 2009-Q2 283209.2 450824.2 1.5918418
food_quarter
## quarter vol_sum weight_sum weight_mean
## 1 2004-Q3 45254119 65338783 1.443820
## 2 2004-Q4 53708734 114192126 2.126137
## 3 2005-Q1 40131741 58267757 1.451912
## 4 2005-Q2 40838462 58920608 1.442772
## 5 2005-Q3 43427850 60045958 1.382660
## 6 2005-Q4 53842154 115550223 2.146092
## 7 2006-Q1 42313931 60169812 1.421986
## 8 2006-Q2 41319440 64407806 1.558777
## 9 2006-Q3 40190757 66527202 1.655286
## 10 2006-Q4 48449256 118508465 2.446033
## 11 2007-Q1 34878405 60592685 1.737255
## 12 2007-Q2 37263025 64619715 1.734151
## 13 2007-Q3 37785439 64805939 1.715103
## 14 2007-Q4 46781096 113948259 2.435776
## 15 2008-Q1 33592958 61586845 1.833326
## 16 2008-Q2 34784510 64242449 1.846869
## 17 2008-Q3 34917310 64928777 1.859501
## 18 2008-Q4 43336629 119913144 2.767016
## 19 2009-Q1 34253703 65968498 1.925879
## 20 2009-Q2 37599772 69647089 1.852327
drug_quarter
## quarter vol_sum weight_sum weight_mean
## 1 2004-Q3 1049291.9 939074.4 0.8949601
## 2 2004-Q4 1153810.6 1219256.0 1.0567211
## 3 2005-Q1 1109632.6 1025841.4 0.9244874
## 4 2005-Q2 1045862.2 927818.8 0.8871329
## 5 2005-Q3 950410.0 857156.0 0.9018803
## 6 2005-Q4 1091571.5 1154029.2 1.0572182
## 7 2006-Q1 1031056.5 987075.1 0.9573433
## 8 2006-Q2 1037414.2 943022.7 0.9090127
## 9 2006-Q3 993771.1 955330.1 0.9613181
## 10 2006-Q4 999417.4 1170064.1 1.1707461
## 11 2007-Q1 954168.2 992351.1 1.0400169
## 12 2007-Q2 1006906.0 1037062.4 1.0299496
## 13 2007-Q3 957705.6 998126.3 1.0422058
## 14 2007-Q4 1069543.0 1252420.9 1.1709869
## 15 2008-Q1 1101923.9 1152598.1 1.0459871
## 16 2008-Q2 997238.3 1089123.3 1.0921395
## 17 2008-Q3 932400.2 1033075.2 1.1079740
## 18 2008-Q4 973827.6 1321109.6 1.3566155
## 19 2009-Q1 921840.3 1141290.5 1.2380566
## 20 2009-Q2 890509.2 1057959.7 1.1880390
all_quarter
## quarter vol_sum weight_sum weight_mean
## 1 2004-Q3 45493777 65506313 1.439896
## 2 2004-Q4 54079429 114889187 2.124453
## 3 2005-Q1 40345435 58516837 1.450396
## 4 2005-Q2 41172428 59212136 1.438150
## 5 2005-Q3 43863256 60393124 1.376850
## 6 2005-Q4 54235530 116184502 2.142221
## 7 2006-Q1 42613731 60474219 1.419125
## 8 2006-Q2 41749994 64790052 1.551858
## 9 2006-Q3 40784565 66989024 1.642509
## 10 2006-Q4 49140998 119608989 2.433996
## 11 2007-Q1 35344465 61020405 1.726449
## 12 2007-Q2 37763541 65133489 1.724772
## 13 2007-Q3 38271365 65346199 1.707444
## 14 2007-Q4 47245892 114854032 2.430984
## 15 2008-Q1 33951036 62077909 1.828454
## 16 2008-Q2 35286182 64829133 1.837239
## 17 2008-Q3 35292671 65438950 1.854180
## 18 2008-Q4 43775137 120742537 2.758245
## 19 2009-Q1 34454876 66318184 1.924784
## 20 2009-Q2 37882981 70097913 1.850380
mass_year <- year_data(mass)
food_year <- year_data(food)
drug_year <- year_data(drug)
all_year <- year_data(all)
mass_year
## year vol_sumy weight_sumy weight_mean
## 1 2004 610352.7 864590.1 1.416542
## 2 2005 1376441.3 1522053.7 1.105789
## 3 2006 2015903.9 2248999.1 1.115628
## 4 2007 1917297.6 2387527.6 1.245257
## 5 2008 1673618.2 2417313.4 1.444364
## 6 2009 484381.9 800509.5 1.652641
food_year
## year vol_sumy weight_sumy weight_mean
## 1 2004 98962853 179530909 1.814124
## 2 2005 178240207 292784546 1.642640
## 3 2006 172273384 309613285 1.797221
## 4 2007 156707966 303966598 1.939701
## 5 2008 146631408 310671215 2.118722
## 6 2009 71853475 135615587 1.887391
drug_year
## year vol_sumy weight_sumy weight_mean
## 1 2004 2203102 2158330 0.9796777
## 2 2005 4197476 3964845 0.9445784
## 3 2006 4061659 4055492 0.9984816
## 4 2007 3988323 4279961 1.0731230
## 5 2008 4005390 4595906 1.1474304
## 6 2009 1812350 2199250 1.2134801
all_year
## year vol_sumy weight_sumy weight_mean
## 1 2004 99573206 180395499 1.811687
## 2 2005 179616648 294306600 1.638526
## 3 2006 174289288 311862284 1.789337
## 4 2007 158625263 306354125 1.931307
## 5 2008 148305026 313088528 2.111112
## 6 2009 72337857 136416097 1.885819
# total turnover; note that this overwrites the data frame `all` with a single number
all <- sum(all$weight_sum)
#author :Cai
# Check for missing values
sum(is.na(mass))
## [1] 0
sum(is.na(food))
## [1] 0
sum(is.na(drug))
## [1] 0
# author: Cai
# Yearly turnover (column 3, weight_sumy) of each series
year_plot <- ggplot(all_year)
x <- 1:nrow(all_year)
year_plot +
  geom_line(aes(x = x, y = all_year[, 3]), color = "red") +
  geom_line(aes(x = x, y = mass_year[, 3]), color = "blue") +
  geom_line(aes(x = x, y = food_year[, 3]), color = "green") +
  geom_line(aes(x = x, y = drug_year[, 3]), color = "skyblue") +
  # label the x axis with the year instead of the row index
  scale_x_continuous(labels = function(x) all_year[x, 1])
mass
## Month VOLUMN PRICE weight_sum
## 2 2004_Jul 91942.57 0.540062 49654.69
## 3 2004_Aug 80327.29 0.642281 51592.69
## 4 2004_Sep 67387.86 0.983593 66282.22
## 5 2004_Oct 81307.07 1.326053 107817.49
## 6 2004_Nov 172584.29 2.154180 371777.62
## 7 2004_Dec 116803.64 1.861803 217465.37
## 8 2005_Jan 67193.46 1.111264 74669.68
## 9 2005_Feb 65111.32 1.109623 72249.02
## 10 2005_Mar 81389.36 1.255217 102161.30
## 11 2005_Apr 90105.43 0.967633 87188.99
## 12 2005_May 136057.89 0.785525 106876.88
## 13 2005_Jun 107802.75 0.904083 97462.63
## 14 2005_Jul 114619.79 0.855804 98092.07
## 15 2005_Aug 152440.71 0.796726 121453.48
## 16 2005_Sep 168344.79 0.758088 127620.16
## 17 2005_Oct 127759.21 0.991650 126692.42
## 18 2005_Nov 143462.07 2.007793 288042.14
## 19 2005_Dec 122154.54 1.797272 219544.93
## 20 2006_Jan 88740.25 1.008996 89538.56
## 21 2006_Feb 87689.43 1.055606 92565.49
## 22 2006_Mar 123369.75 0.991354 122303.10
## 23 2006_Apr 155312.04 0.856214 132980.34
## 24 2006_May 140064.93 0.854864 119736.47
## 25 2006_Jun 135176.79 0.958224 129529.64
## 26 2006_Jul 178455.57 0.723941 129191.30
## 27 2006_Aug 216618.21 0.771413 167102.11
## 28 2006_Sep 198734.93 0.832910 165528.31
## 29 2006_Oct 215120.07 0.956922 205853.13
## 30 2006_Nov 261636.86 1.912054 500263.80
## 31 2006_Dec 214985.07 1.834578 394406.88
## 32 2007_Jan 149417.79 0.983257 146916.08
## 33 2007_Feb 141932.71 0.898412 127514.05
## 34 2007_Mar 174709.43 0.877399 153289.88
## 35 2007_Apr 169455.00 0.946082 160318.33
## 36 2007_May 168988.82 0.948443 160276.26
## 37 2007_Jun 162072.32 1.191935 193179.67
## 38 2007_Jul 137077.07 1.318379 180719.53
## 39 2007_Aug 167178.39 1.047446 175110.34
## 40 2007_Sep 181669.89 1.015195 184430.37
## 41 2007_Oct 165415.43 1.314417 217424.85
## 42 2007_Nov 173307.93 2.376205 411815.17
## 43 2007_Dec 126072.79 2.193440 276533.09
## 44 2008_Jan 105059.14 1.443777 151681.97
## 45 2008_Feb 149494.57 1.077497 161079.95
## 46 2008_Mar 103524.57 1.722317 178302.13
## 47 2008_Apr 174153.64 1.104731 192392.93
## 48 2008_May 197059.00 1.033892 203737.72
## 49 2008_Jun 130458.93 1.460632 190552.49
## 50 2008_Jul 133018.07 1.309694 174212.97
## 51 2008_Aug 146845.21 1.248834 183385.30
## 52 2008_Sep 95496.96 1.597687 152574.26
## 53 2008_Oct 78102.11 1.838252 143571.35
## 54 2008_Nov 215058.29 1.976768 425120.34
## 55 2008_Dec 145347.68 1.793644 260701.99
## 56 2009_Jan 79226.18 1.655723 131176.61
## 57 2009_Feb 53561.14 1.751621 93818.82
## 58 2009_Mar 68385.39 1.823341 124689.89
## 59 2009_Apr 87508.82 1.696780 148483.22
## 60 2009_May 87781.79 1.627419 142857.75
## 61 2009_Jun 107918.57 1.477811 159483.25
#get month data
#author :Du
mass_month <- data.frame(mass$Month,mass$VOLUMN,mass$PRICE,mass$weight_sum)
mass_month
## mass.Month mass.VOLUMN mass.PRICE mass.weight_sum
## 1 2004_Jul 91942.57 0.540062 49654.69
## 2 2004_Aug 80327.29 0.642281 51592.69
## 3 2004_Sep 67387.86 0.983593 66282.22
## 4 2004_Oct 81307.07 1.326053 107817.49
## 5 2004_Nov 172584.29 2.154180 371777.62
## 6 2004_Dec 116803.64 1.861803 217465.37
## 7 2005_Jan 67193.46 1.111264 74669.68
## 8 2005_Feb 65111.32 1.109623 72249.02
## 9 2005_Mar 81389.36 1.255217 102161.30
## 10 2005_Apr 90105.43 0.967633 87188.99
## 11 2005_May 136057.89 0.785525 106876.88
## 12 2005_Jun 107802.75 0.904083 97462.63
## 13 2005_Jul 114619.79 0.855804 98092.07
## 14 2005_Aug 152440.71 0.796726 121453.48
## 15 2005_Sep 168344.79 0.758088 127620.16
## 16 2005_Oct 127759.21 0.991650 126692.42
## 17 2005_Nov 143462.07 2.007793 288042.14
## 18 2005_Dec 122154.54 1.797272 219544.93
## 19 2006_Jan 88740.25 1.008996 89538.56
## 20 2006_Feb 87689.43 1.055606 92565.49
## 21 2006_Mar 123369.75 0.991354 122303.10
## 22 2006_Apr 155312.04 0.856214 132980.34
## 23 2006_May 140064.93 0.854864 119736.47
## 24 2006_Jun 135176.79 0.958224 129529.64
## 25 2006_Jul 178455.57 0.723941 129191.30
## 26 2006_Aug 216618.21 0.771413 167102.11
## 27 2006_Sep 198734.93 0.832910 165528.31
## 28 2006_Oct 215120.07 0.956922 205853.13
## 29 2006_Nov 261636.86 1.912054 500263.80
## 30 2006_Dec 214985.07 1.834578 394406.88
## 31 2007_Jan 149417.79 0.983257 146916.08
## 32 2007_Feb 141932.71 0.898412 127514.05
## 33 2007_Mar 174709.43 0.877399 153289.88
## 34 2007_Apr 169455.00 0.946082 160318.33
## 35 2007_May 168988.82 0.948443 160276.26
## 36 2007_Jun 162072.32 1.191935 193179.67
## 37 2007_Jul 137077.07 1.318379 180719.53
## 38 2007_Aug 167178.39 1.047446 175110.34
## 39 2007_Sep 181669.89 1.015195 184430.37
## 40 2007_Oct 165415.43 1.314417 217424.85
## 41 2007_Nov 173307.93 2.376205 411815.17
## 42 2007_Dec 126072.79 2.193440 276533.09
## 43 2008_Jan 105059.14 1.443777 151681.97
## 44 2008_Feb 149494.57 1.077497 161079.95
## 45 2008_Mar 103524.57 1.722317 178302.13
## 46 2008_Apr 174153.64 1.104731 192392.93
## 47 2008_May 197059.00 1.033892 203737.72
## 48 2008_Jun 130458.93 1.460632 190552.49
## 49 2008_Jul 133018.07 1.309694 174212.97
## 50 2008_Aug 146845.21 1.248834 183385.30
## 51 2008_Sep 95496.96 1.597687 152574.26
## 52 2008_Oct 78102.11 1.838252 143571.35
## 53 2008_Nov 215058.29 1.976768 425120.34
## 54 2008_Dec 145347.68 1.793644 260701.99
## 55 2009_Jan 79226.18 1.655723 131176.61
## 56 2009_Feb 53561.14 1.751621 93818.82
## 57 2009_Mar 68385.39 1.823341 124689.89
## 58 2009_Apr 87508.82 1.696780 148483.22
## 59 2009_May 87781.79 1.627419 142857.75
## 60 2009_Jun 107918.57 1.477811 159483.25
food_month <- data.frame(food$Month,food$VOLUMN,food$PRICE,food$weight_sum)
food_month
## food.Month food.VOLUMN food.PRICE food.weight_sum
## 1 2004_Jul 14580047 1.492053 21754203
## 2 2004_Aug 14809138 1.425275 21107094
## 3 2004_Sep 15864934 1.416803 22477486
## 4 2004_Oct 17125286 1.565595 26811262
## 5 2004_Nov 20660633 2.405971 49708883
## 6 2004_Dec 15922816 2.365912 37671981
## 7 2005_Jan 12963245 1.482576 19218996
## 8 2005_Feb 12795191 1.398810 17898041
## 9 2005_Mar 14373304 1.471528 21150720
## 10 2005_Apr 13649767 1.397770 19079235
## 11 2005_May 13877405 1.423743 19757858
## 12 2005_Jun 13311289 1.508758 20083514
## 13 2005_Jul 13453651 1.475236 19847310
## 14 2005_Aug 14053983 1.383799 19447887
## 15 2005_Sep 15920217 1.303422 20750761
## 16 2005_Oct 16409994 1.581861 25958330
## 17 2005_Nov 21044428 2.453094 51623960
## 18 2005_Dec 16387732 2.316851 37967933
## 19 2006_Jan 13403200 1.484444 19896300
## 20 2006_Feb 13562728 1.382346 18748382
## 21 2006_Mar 15348004 1.402471 21525130
## 22 2006_Apr 14298923 1.517994 21705679
## 23 2006_May 13635635 1.535130 20932473
## 24 2006_Jun 13384882 1.626436 21769653
## 25 2006_Jul 13270426 1.632979 21670327
## 26 2006_Aug 13210034 1.661734 21951563
## 27 2006_Sep 13710296 1.670665 22905312
## 28 2006_Oct 14707820 1.877999 27621271
## 29 2006_Nov 19268179 2.801139 53972847
## 30 2006_Dec 14473257 2.550521 36914347
## 31 2007_Jan 11343783 1.807471 20503559
## 32 2007_Feb 11147826 1.688711 18825456
## 33 2007_Mar 12386796 1.716640 21263670
## 34 2007_Apr 11951513 1.785742 21342318
## 35 2007_May 12634818 1.681302 21242945
## 36 2007_Jun 12676694 1.738186 22034452
## 37 2007_Jul 12036229 1.776372 21380821
## 38 2007_Aug 12535094 1.705026 21372660
## 39 2007_Sep 13214117 1.668856 22052458
## 40 2007_Oct 14616720 1.807921 26425874
## 41 2007_Nov 18908816 2.837957 53662408
## 42 2007_Dec 13255560 2.554398 33859977
## 43 2008_Jan 11116519 1.852541 20593807
## 44 2008_Feb 10703282 1.802626 19294015
## 45 2008_Mar 11773157 1.843093 21699023
## 46 2008_Apr 11475396 1.776068 20381083
## 47 2008_May 11807544 1.835863 21677033
## 48 2008_Jun 11501571 1.928809 22184333
## 49 2008_Jul 11730317 1.828197 21445330
## 50 2008_Aug 11499209 1.869837 21501646
## 51 2008_Sep 11687785 1.880750 21981801
## 52 2008_Oct 12939811 2.137677 27661136
## 53 2008_Nov 17506533 3.144949 55057153
## 54 2008_Dec 12890286 2.885495 37194855
## 55 2009_Jan 10582184 2.120130 22435606
## 56 2009_Feb 10634505 1.839275 19559780
## 57 2009_Mar 13037014 1.838850 23973112
## 58 2009_Apr 12569873 1.840990 23141011
## 59 2009_May 12705934 1.802271 22899537
## 60 2009_Jun 12323965 1.915499 23606542
drug_month <- data.frame(drug$Month,drug$VOLUMN,drug$PRICE,drug$weight_sum)
drug_month
## drug.Month drug.VOLUMN drug.PRICE drug.weight_sum
## 1 2004_Jul 342038.9 0.915870 313263.2
## 2 2004_Aug 353868.0 0.878705 310945.5
## 3 2004_Sep 353385.0 0.890999 314865.7
## 4 2004_Oct 382660.5 0.918136 351334.4
## 5 2004_Nov 396199.8 1.161077 460018.5
## 6 2004_Dec 374950.2 1.087886 407903.1
## 7 2005_Jan 373119.7 0.907105 338458.8
## 8 2005_Feb 363302.8 0.950365 345270.2
## 9 2005_Mar 373210.1 0.916675 342112.4
## 10 2005_Apr 356498.6 0.898380 320271.2
## 11 2005_May 373107.9 0.880589 328554.7
## 12 2005_Jun 316255.7 0.882175 278992.9
## 13 2005_Jul 309310.4 0.904569 279792.6
## 14 2005_Aug 318453.2 0.913966 291055.4
## 15 2005_Sep 322646.4 0.887374 286308.0
## 16 2005_Oct 362755.7 0.968978 351502.3
## 17 2005_Nov 376950.7 1.144221 431314.9
## 18 2005_Dec 351865.1 1.054984 371212.0
## 19 2006_Jan 338764.8 0.984137 333391.0
## 20 2006_Feb 325562.7 0.961842 313139.9
## 21 2006_Mar 366729.0 0.928599 340544.2
## 22 2006_Apr 353113.2 0.900744 318064.6
## 23 2006_May 355834.2 0.914624 325454.5
## 24 2006_Jun 328466.9 0.911823 299503.6
## 25 2006_Jul 317527.0 0.955292 303331.0
## 26 2006_Aug 330962.4 0.966118 319748.8
## 27 2006_Sep 345281.6 0.962259 332250.4
## 28 2006_Oct 344282.4 1.080196 371892.5
## 29 2006_Nov 339512.9 1.278718 434141.2
## 30 2006_Dec 315622.1 1.153374 364030.4
## 31 2007_Jan 303314.6 1.054472 319836.8
## 32 2007_Feb 307557.4 1.037891 319211.0
## 33 2007_Mar 343296.2 1.029150 353303.3
## 34 2007_Apr 337749.6 1.018466 343986.5
## 35 2007_May 349303.3 1.032396 360619.3
## 36 2007_Jun 319853.0 1.039404 332456.5
## 37 2007_Jul 308596.2 1.046101 322822.8
## 38 2007_Aug 312512.3 1.045203 326638.8
## 39 2007_Sep 336597.0 1.035852 348664.7
## 40 2007_Oct 364338.8 1.090308 397241.5
## 41 2007_Nov 362315.0 1.283790 465136.4
## 42 2007_Dec 342889.2 1.137519 390043.0
## 43 2008_Jan 339879.9 1.109045 376942.1
## 44 2008_Feb 355203.6 1.048227 372334.0
## 45 2008_Mar 406840.4 0.991352 403322.0
## 46 2008_Apr 332608.3 1.087043 361559.5
## 47 2008_May 350392.2 1.101599 385991.7
## 48 2008_Jun 314237.8 1.086986 341572.1
## 49 2008_Jul 310672.0 1.069949 332403.2
## 50 2008_Aug 301160.8 1.122488 338049.3
## 51 2008_Sep 320567.5 1.131190 362622.8
## 52 2008_Oct 324321.3 1.210744 392670.1
## 53 2008_Nov 328502.6 1.538714 505471.6
## 54 2008_Dec 321003.6 1.317642 422967.9
## 55 2009_Jan 307358.9 1.252852 385075.2
## 56 2009_Feb 288944.6 1.235202 356904.9
## 57 2009_Mar 325536.9 1.226621 399310.4
## 58 2009_Apr 316638.4 1.161370 367734.3
## 59 2009_May 299196.1 1.196255 357914.9
## 60 2009_Jun 274674.6 1.209833 332310.4
#author :Du
#mass
#Generate time series objects
# The first observation is 2004_Jul, so the series starts in month 7 of 2004
ts_mass_month <- ts(mass_month$mass.weight_sum,start = c(2004,7),frequency = 12)
fit_mass <- stl(ts_mass_month,s.window = 'periodic')
plot(fit_mass)
fit_mass %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
#food
#Generate time series objects
ts_food_month <- ts(food_month$food.weight_sum,start = c(2004,7),frequency = 12)
fit_food <- stl(ts_food_month,s.window = 'periodic')
plot(fit_food)
fit_food %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
#drug
#Generate time series objects
ts_drug_month <- ts(drug_month$drug.weight_sum,start = c(2004,7),frequency = 12)
fit_drug <- stl(ts_drug_month,s.window = 'periodic')
plot(fit_drug)
fit_drug %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
#author :Du
#mass
fit_mass %>% forecast(h=36) %>%
autoplot() +
xlab("time") +
ylab("sales")+
ggtitle('mass cake predict') +
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
#food
fit_food %>% forecast(h=36) %>%
autoplot() +
xlab("time") +
ylab("sales")+
ggtitle('food cake predict') +
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
#drug
fit_drug %>% forecast(h=36) %>%
autoplot() +
xlab("time") +
ylab("sales")+
ggtitle('drug cake predict') +
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
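The STL step above splits each monthly series into a seasonal, a trend and a remainder part. The following is a minimal sketch in base R on a synthetic series (the series `x` and all of its parameters are made up for illustration and are not the sales data). It shows that the three components add back to the original series and that a periodic seasonal window repeats the same seasonal pattern every year.

```r
# Synthetic monthly series, for illustration only (not the sales data)
set.seed(1)
n <- 60
idx <- seq_len(n)
x <- ts(100 + 0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(n),
        start = c(2004, 7), frequency = 12)

# STL with a periodic seasonal window
fit <- stl(x, s.window = "periodic")
parts <- fit$time.series            # columns: seasonal, trend, remainder

# The decomposition is additive: seasonal + trend + remainder = x
recon <- rowSums(parts)
max_gap <- max(abs(recon - as.numeric(x)))
print(max_gap)

# With a periodic window the seasonal component repeats each year
seasonal <- as.numeric(parts[, "seasonal"])
same_pattern <- isTRUE(all.equal(seasonal[1:12], seasonal[13:24]))
print(same_pattern)
```

Because the seasonal part is fixed, any change in the size of the November peak from year to year ends up in the trend or remainder component.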
#author :Liao
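# Quarterly series. NOTE: start = 2004 with end = 2009 spans 2004 Q1 to 2009 Q1 (21 quarters),
# whereas the monthly data cover 2004_Jul to 2009_Jun; check that mass_quarter matches this index
data_mass_quarter <- ts(mass_quarter$weight_sum,frequency=4,start=2004,end=2009)
data <- data_mass_quarter
plot(data)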
ndiffs(data)
## [1] 0
# ndiffs() returns 0 (no differencing required); the first difference is still inspected below
ddata <- diff(data)
plot(ddata)
ADF <- adf.test(ddata)
## Warning in adf.test(ddata): p-value smaller than printed p-value
ADF
##
## Augmented Dickey-Fuller Test
##
## data: ddata
## Dickey-Fuller = -9.5978, Lag order = 2, p-value = 0.01
## alternative hypothesis: stationary
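The time index passed to `ts()` decides which calendar period each observation is assigned to. As a hedged illustration (the vectors `1:60` and `1:20` are placeholders for the real columns, and the assumption that the quarterly table has 20 rows is ours and is not confirmed by the output shown here), this sketch shows how `start` and `end` behave for data running from July 2004 to June 2009.

```r
# Placeholder values standing in for 60 monthly / 20 quarterly observations
monthly <- ts(1:60, start = c(2004, 7), frequency = 12)
print(start(monthly))   # first period of the series
print(end(monthly))     # last period of the series

# Quarterly index anchored at 2004 Q3: 20 values fit exactly
quarterly <- ts(1:20, start = c(2004, 3), frequency = 4)
print(end(quarterly))

# With start = 2004 and end = 2009 the index spans 2004 Q1 to 2009 Q1,
# i.e. 21 periods, so ts() recycles a 20-value vector to fill the gap
stretched <- ts(1:20, start = 2004, end = 2009, frequency = 4)
print(length(stretched))
```

If the quarterly table really has 20 rows, giving only `start = c(2004, 3)` and leaving out `end` would keep every value in its own quarter.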
#####2.Model order selection and fitting
# author: Liao
fit <- auto.arima(data)
fit
## Series: data
## ARIMA(0,1,0)(1,1,0)[4]
##
## Coefficients:
## sar1
## -0.5622
## s.e. 0.1837
##
## sigma^2 = 2.137e+10: log likelihood = -213.23
## AIC=430.46 AICc=431.38 BIC=432
accuracy(fit)
## ME RMSE MAE MPE MAPE MASE
## Training set -29944.76 123545.7 83392.11 -9.999367 20.49734 0.6007077
## ACF1
## Training set -0.2865116
#####3.Model diagnosis
# author: Liao
qqnorm(fit$residuals) #plot
qqline(fit$residuals) #add line
Box.test(fit$residuals, type="Ljung-Box")
##
## Box-Ljung test
##
## data: fit$residuals
## X-squared = 1.9824, df = 1, p-value = 0.1591
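#Ljung-Box test on the residuals: p-value = 0.1591 > 0.05, so the null hypothesis of no autocorrelation is not rejected and the residuals are consistent with white noise. The test gives no evidence against the model, although the training MAPE (about 20.5%) remains high.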
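For reading the Ljung-Box output: the null hypothesis is that the series has no autocorrelation, so a small p-value signals structure left in the residuals and a large one does not. A minimal sketch on simulated data (the AR(1) series and its coefficient 0.9 are illustrative choices, not taken from the report) that reproduces the lag-1 statistic by hand:

```r
set.seed(42)
n <- 200
# Strongly autocorrelated series, for illustration only
ar1 <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))

lb <- Box.test(ar1, lag = 1, type = "Ljung-Box")

# Manual Ljung-Box statistic for lag 1: Q = n (n + 2) r1^2 / (n - 1)
r1 <- acf(ar1, lag.max = 1, plot = FALSE)$acf[2]
q_manual <- n * (n + 2) * r1^2 / (n - 1)

print(unname(lb$statistic))
print(q_manual)

# A p-value below 0.05 rejects "no autocorrelation" for this series
print(lb$p.value < 0.05)
```

For the ARIMA residuals above the situation is the opposite: the p-value is above 0.05, so the hypothesis of uncorrelated residuals is kept.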
#author:Wang
# Create monthly data time series
ts_mass_month <- ts(mass_month$mass.weight_sum,start = c(2004,7),frequency = 12)
# Draw a monthly data graph
autoplot(ts_mass_month)
# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_mass_month,end=length(ts_mass_month)-35),
damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_mass_month) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="month forecasts"))
# Comparison of Holt-Winters Addition and Multiplication Methods for Monthly Data
aust <- window(ts_mass_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("mass_month") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
# Create quarterly data time series
ts_mass_quarter <- ts(mass_quarter$weight_sum,frequency=4,start=2004,end=2009)
# Draw quarterly data graphs
autoplot(ts_mass_quarter)
# Quarterly data forecast with Holt-Winters model
fc <- hw(subset(ts_mass_quarter,end=length(ts_mass_quarter)-10),
damped = TRUE, seasonal="multiplicative", h=10) # horizon = the 10 held-out quarters
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_mass_quarter) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="Quarterly forecasts"))
# Comparison of Holt-Winters Additive and Multiplicative Methods for Quarterly Data
aust <- window(ts_mass_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("mass_quarter") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
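The Holt-Winters chunks above hold back the end of each series and forecast it. Below is a sketch of the same hold-out idea in base R, with a seasonal naive forecast as a simple benchmark. This is a stand-in technique and not the `hw()` model used in the report, and the synthetic series and the 12-month hold-out are assumptions made for illustration.

```r
# Synthetic monthly series, for illustration only
set.seed(7)
n <- 60
idx <- seq_len(n)
y <- ts(200 + 2 * idx + 30 * sin(2 * pi * idx / 12) + rnorm(n, sd = 5),
        start = c(2004, 7), frequency = 12)

# Hold out the last h months
h <- 12
train <- window(y, end = time(y)[n - h])
test  <- window(y, start = time(y)[n - h + 1])

# Seasonal naive benchmark: each month is forecast by the same month one year earlier
last_year <- tail(as.numeric(train), 12)
fc <- ts(rep(last_year, length.out = h), start = start(test), frequency = 12)

# Mean absolute percentage error on the held-out months
mape <- mean(abs((as.numeric(test) - as.numeric(fc)) / as.numeric(test))) * 100
print(round(mape, 2))
```

A Holt-Winters forecast is only worth keeping if its hold-out error is lower than that of a benchmark of this kind.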
####HW-MODEL
# author:wang
### Food data
# Create monthly data time series
ts_food_month <- ts(food_month$food.weight_sum,start = c(2004,7),frequency = 12)
# Draw a monthly data graph
autoplot(ts_food_month)
# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_food_month,end=length(ts_food_month)-35),
damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_food_month) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="month forecasts"))
# Comparison of Holt-Winters Addition and Multiplication Methods for Monthly Data
aust <- window(ts_food_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("food_month") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
# Create quarterly data time series
ts_food_quarter <- ts(food_quarter$weight_sum,frequency=4,start=2004,end=2009)
# Draw quarterly data graphs
autoplot(ts_food_quarter)
# Quarterly data forecast with Holt-Winters model
fc <- hw(subset(ts_food_quarter,end=length(ts_food_quarter)-10),
damped = TRUE, seasonal="multiplicative", h=10) # horizon = the 10 held-out quarters
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_food_quarter) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="Quarterly forecasts"))
# Comparison of Holt-Winters Additive and Multiplicative Methods for Quarterly Data
aust <- window(ts_food_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("food_quarter") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
####HW-MODEL
# author:wang
### Drug data
# Create monthly data time series
ts_drug_month <- ts(drug_month$drug.weight_sum,start = c(2004,7),frequency = 12)
# Draw a monthly data graph
autoplot(ts_drug_month)
# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_drug_month,end=length(ts_drug_month)-35),
damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_drug_month) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="month forecasts"))
# Comparison of Holt-Winters Addition and Multiplication Methods for Monthly Data
aust <- window(ts_drug_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("drug_month") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
# Create quarterly data time series
ts_drug_quarter <- ts(drug_quarter$weight_sum,frequency=4,start=2004,end=2009)
# Draw quarterly data graphs
autoplot(ts_drug_quarter)
# Quarterly data forecast with Holt-Winters model
fc <- hw(subset(ts_drug_quarter,end=length(ts_drug_quarter)-10),
damped = TRUE, seasonal="multiplicative", h=10) # horizon = the 10 held-out quarters
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_drug_quarter) +
autolayer(fc, series="HW multi damped", PI=FALSE)+
guides(colour=guide_legend(title="Quarterly forecasts"))
# Comparison of Holt-Winters Additive and Multiplicative Methods for Quarterly Data
aust <- window(ts_drug_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts",
PI=FALSE) +
xlab("Year") +
ylab("drug_quarter") +
ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
guides(colour=guide_legend(title="Forecast"))
# author: Ding
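# Fill in the missing months: 2004 (Jul-Dec) and 2009 (Jan-Jun) each have only six months of data,
# so the six missing months of each year are estimated by the mean of the six observed months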
mass_2004 <- filter(mass, substring(mass$Month, 1, 4) == 2004)
mass_2009 <- filter(mass, substring(mass$Month, 1, 4) == 2009)
food_2004 <- filter(food, substring(food$Month, 1, 4) == 2004)
food_2009 <- filter(food, substring(food$Month, 1, 4) == 2009)
drug_2004 <- filter(drug, substring(drug$Month, 1, 4) == 2004)
drug_2009 <- filter(drug, substring(drug$Month, 1, 4) == 2009)
exp04m <- sum(mass_2004$weight_sum)/6
exp04m
## [1] 144098.3
exp09m <- sum(mass_2009$weight_sum)/6
exp09m
## [1] 133418.3
exp04f <- sum(food_2004$weight_sum)/6
exp04f
## [1] 29921818
exp09f <- sum(food_2009$weight_sum)/6
exp09f
## [1] 22602598
exp04d <- sum(drug_2004$weight_sum)/6
exp04d
## [1] 359721.7
exp09d <- sum(drug_2009$weight_sum)/6
exp09d
## [1] 366541.7
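The fill-in rule can be checked by hand. The sketch below repeats the arithmetic for the mass channel with the six 2004 and six 2009 `weight_sum` values copied from the monthly table in this report. The helper `annualise()` is a hypothetical name introduced here, and the sketch assumes that the yearly table holds the plain sum of the observed months.

```r
# weight_sum for mass, copied from the monthly table in this report
mass_2004 <- c(49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37)   # Jul-Dec 2004
mass_2009 <- c(131176.61, 93818.82, 124689.89, 148483.22, 142857.75, 159483.25) # Jan-Jun 2009

# Hypothetical helper: impute the missing months with the observed monthly mean
annualise <- function(observed, n_missing = 12 - length(observed)) {
  monthly_mean <- mean(observed)
  list(monthly_mean = monthly_mean,
       annual_total = sum(observed) + n_missing * monthly_mean)
}

est_2004 <- annualise(mass_2004)
est_2009 <- annualise(mass_2009)

print(round(est_2004$monthly_mean, 1))   # matches exp04m above: 144098.3
print(round(est_2009$monthly_mean, 1))   # matches exp09m above: 133418.3

# With six observed and six imputed months the annual total is twice the observed sum
ratio_2004 <- est_2004$annual_total / sum(mass_2004)
print(ratio_2004)
```

The observed half of 2004 contains the November and December peaks and the observed half of 2009 does not, so this mean-based fill probably overstates the 2004 total and understates the 2009 total.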
mass_year[1,3]<-mass_year[1,3] + exp04m*6
mass_year[6,3]<-mass_year[6,3] + exp09m*6
train_mass <- mass_year
library(tsibble)
data_df_year_ts<-train_mass%>%
mutate(data = as.integer(year)) %>%
as_tsibble(index =data )
fit_trends <- data_df_year_ts %>%
model(
linear = TSLM(weight_sumy ~ trend()),
)
fc_trends <- fit_trends %>% forecast(h = 3)
data_df_year_ts %>%
autoplot(weight_sumy ) +
geom_line(data = fitted(fit_trends),
aes(y = .fitted, x= data, colour = .model)) +
autolayer(fc_trends, alpha = 0.5, level = 95) +
labs(y = "weight_sum",
title = "mass_year: linear trend and 3-year forecast")
# food_year
food_year[1,3]<-food_year[1,3] + exp04f*6
food_year[6,3]<-food_year[6,3] + exp09f*6
train_mass2 <- food_year
library(tsibble)
data_df_year_ts2<-train_mass2%>%
mutate(data = as.integer(year)) %>%
as_tsibble(index =data )
fit_trends2 <- data_df_year_ts2 %>%
model(
linear = TSLM(weight_sumy ~ trend()),
)
fc_trends2 <- fit_trends2 %>% forecast(h = 3)
data_df_year_ts2 %>%
autoplot(weight_sumy ) +
geom_line(data = fitted(fit_trends2),
aes(y = .fitted,x= data, colour = .model)) +
autolayer(fc_trends2, alpha = 0.5, level = 95) +
labs(y = "weight_sum",
title = "food_year: linear trend and 3-year forecast")
#drug_year
drug_year[1,3]<-drug_year[1,3] + exp04d*6
drug_year[6,3]<-drug_year[6,3] + exp09d*6
train_mass3 <- drug_year
library(tsibble)
data_df_year_ts3<-train_mass3%>%
mutate(data = as.integer(year)) %>%
as_tsibble(index =data )
fit_trends3 <- data_df_year_ts3 %>%
model(
linear = TSLM(weight_sumy ~ trend()),
)
fc_trends3 <- fit_trends3 %>% forecast(h = 3)
data_df_year_ts3 %>%
autoplot(weight_sumy ) +
geom_line(data = fitted(fit_trends3),
aes(y = .fitted,x= data, colour = .model)) +
autolayer(fc_trends3, alpha = 0.5, level = 95) +
labs(y = "weight_sum",
title = "drug_year: linear trend and 3-year forecast")
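A linear trend fitted to six yearly totals leaves very few degrees of freedom, which is why the 95% bands in these plots are wide. The sketch below uses plain `lm()` as a stand-in for `TSLM(weight_sumy ~ trend())`, and the yearly totals are made-up numbers for illustration only. It shows the prediction interval widening as the forecast moves away from the observed years.

```r
# Made-up yearly totals, for illustration only (not the report's figures)
yearly <- data.frame(year = 2004:2009,
                     total = c(10.2, 11.1, 11.9, 12.4, 13.6, 14.1))

# Linear trend in the year index
fit <- lm(total ~ year, data = yearly)

# Forecast three further years with 95% prediction intervals
future <- data.frame(year = 2010:2012)
fc <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
width <- fc[, "upr"] - fc[, "lwr"]
print(cbind(future, fc, width))
```

The `width` column grows from 2010 to 2012, so the third forecast year carries the least information.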
Sys.setlocale('LC_TIME', 'C')
## [1] "C"
month <- mass$Month
month <- str_c('1_',month)
month <- as.Date(month,format='%d_%Y_%b')
mass
## Month VOLUMN PRICE weight_sum
## 2 2004_Jul 91942.57 0.540062 49654.69
## 3 2004_Aug 80327.29 0.642281 51592.69
## 4 2004_Sep 67387.86 0.983593 66282.22
## 5 2004_Oct 81307.07 1.326053 107817.49
## 6 2004_Nov 172584.29 2.154180 371777.62
## 7 2004_Dec 116803.64 1.861803 217465.37
## 8 2005_Jan 67193.46 1.111264 74669.68
## 9 2005_Feb 65111.32 1.109623 72249.02
## 10 2005_Mar 81389.36 1.255217 102161.30
## 11 2005_Apr 90105.43 0.967633 87188.99
## 12 2005_May 136057.89 0.785525 106876.88
## 13 2005_Jun 107802.75 0.904083 97462.63
## 14 2005_Jul 114619.79 0.855804 98092.07
## 15 2005_Aug 152440.71 0.796726 121453.48
## 16 2005_Sep 168344.79 0.758088 127620.16
## 17 2005_Oct 127759.21 0.991650 126692.42
## 18 2005_Nov 143462.07 2.007793 288042.14
## 19 2005_Dec 122154.54 1.797272 219544.93
## 20 2006_Jan 88740.25 1.008996 89538.56
## 21 2006_Feb 87689.43 1.055606 92565.49
## 22 2006_Mar 123369.75 0.991354 122303.10
## 23 2006_Apr 155312.04 0.856214 132980.34
## 24 2006_May 140064.93 0.854864 119736.47
## 25 2006_Jun 135176.79 0.958224 129529.64
## 26 2006_Jul 178455.57 0.723941 129191.30
## 27 2006_Aug 216618.21 0.771413 167102.11
## 28 2006_Sep 198734.93 0.832910 165528.31
## 29 2006_Oct 215120.07 0.956922 205853.13
## 30 2006_Nov 261636.86 1.912054 500263.80
## 31 2006_Dec 214985.07 1.834578 394406.88
## 32 2007_Jan 149417.79 0.983257 146916.08
## 33 2007_Feb 141932.71 0.898412 127514.05
## 34 2007_Mar 174709.43 0.877399 153289.88
## 35 2007_Apr 169455.00 0.946082 160318.33
## 36 2007_May 168988.82 0.948443 160276.26
## 37 2007_Jun 162072.32 1.191935 193179.67
## 38 2007_Jul 137077.07 1.318379 180719.53
## 39 2007_Aug 167178.39 1.047446 175110.34
## 40 2007_Sep 181669.89 1.015195 184430.37
## 41 2007_Oct 165415.43 1.314417 217424.85
## 42 2007_Nov 173307.93 2.376205 411815.17
## 43 2007_Dec 126072.79 2.193440 276533.09
## 44 2008_Jan 105059.14 1.443777 151681.97
## 45 2008_Feb 149494.57 1.077497 161079.95
## 46 2008_Mar 103524.57 1.722317 178302.13
## 47 2008_Apr 174153.64 1.104731 192392.93
## 48 2008_May 197059.00 1.033892 203737.72
## 49 2008_Jun 130458.93 1.460632 190552.49
## 50 2008_Jul 133018.07 1.309694 174212.97
## 51 2008_Aug 146845.21 1.248834 183385.30
## 52 2008_Sep 95496.96 1.597687 152574.26
## 53 2008_Oct 78102.11 1.838252 143571.35
## 54 2008_Nov 215058.29 1.976768 425120.34
## 55 2008_Dec 145347.68 1.793644 260701.99
## 56 2009_Jan 79226.18 1.655723 131176.61
## 57 2009_Feb 53561.14 1.751621 93818.82
## 58 2009_Mar 68385.39 1.823341 124689.89
## 59 2009_Apr 87508.82 1.696780 148483.22
## 60 2009_May 87781.79 1.627419 142857.75
## 61 2009_Jun 107918.57 1.477811 159483.25
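mass$ds <- month
mass$y <- mass$weight_sum
p_mass <- data.frame(mass['ds'],mass['y'])
# NOTE: this renames every column of mass by position (Month -> ds, VOLUMN -> y, ds -> date, y -> sum);
# p_mass was built before the rename and still holds the parsed date and weight_sum
colnames(mass) <- c('ds','y','PRICE','weight_sum','date','sum')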
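The month labels have no day component, so a day is prepended before parsing. A small base R sketch of that step, using `paste0()` in place of `str_c()` and three labels taken from the `Month` column:

```r
# %b matches English month abbreviations only under an English or C time locale
Sys.setlocale("LC_TIME", "C")

# Labels as they appear in the Month column
month_labels <- c("2004_Jul", "2004_Aug", "2009_Jun")

# Prepend day 1 so that each label becomes a full date
ds <- as.Date(paste0("1_", month_labels), format = "%d_%Y_%b")
print(ds)
```

Without the `Sys.setlocale()` call, a non-English locale may fail to match `Jul` and return `NA`.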
# author Cai.
# mass forecast
r_mass <-data.frame(p_mass['ds'],p_mass['y'])
# Data visualisation
r_mass %>%
plot_time_series(ds,y)
# Split data 90/10 (prop = 0.9): 54 training months and 6 test months
splits <- initial_time_split(r_mass, prop = 0.9)
recipe_spec <- recipe(y ~ ds, training(splits)) %>%
step_timeseries_signature(ds) %>%
# step_fourier(date, period = 365, K = 5) %>%
step_dummy(all_nominal())
recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 44
## ds y ds_index.num ds_year ds_year.iso ds_half ds_quarter
## <date> <dbl> <dbl> <int> <int> <int> <int>
## 1 2004-07-01 49655. 1088640000 2004 2004 2 3
## 2 2004-08-01 51593. 1091318400 2004 2004 2 3
## 3 2004-09-01 66282. 1093996800 2004 2004 2 3
## 4 2004-10-01 107817. 1096588800 2004 2004 2 4
## 5 2004-11-01 371778. 1099267200 2004 2004 2 4
## 6 2004-12-01 217465. 1101859200 2004 2004 2 4
## 7 2005-01-01 74670. 1104537600 2005 2004 1 1
## 8 2005-02-01 72249. 1107216000 2005 2005 1 1
## 9 2005-03-01 102161. 1109635200 2005 2005 1 1
## 10 2005-04-01 87189. 1112313600 2005 2005 1 2
## # … with 44 more rows, and 37 more variables: ds_month <int>,
## # ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## # ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## # ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## # ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## # ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_month.lbl_01 <dbl>,
## # ds_month.lbl_02 <dbl>, ds_month.lbl_03 <dbl>, ds_month.lbl_04 <dbl>, …
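The 54 training rows shown above follow from a time-ordered split with `prop = 0.9` on 60 months. The sketch below reproduces that count in base R. Taking the first `floor(n * prop)` rows is our understanding of how `initial_time_split()` behaves, stated here as an assumption.

```r
# Monthly dates from 2004-07 to 2009-06, as in the report
ds <- seq(as.Date("2004-07-01"), by = "month", length.out = 60)

n <- length(ds)
prop <- 0.9
n_train <- floor(n * prop)          # rows kept for training

train_idx <- seq_len(n_train)       # earliest observations
test_idx  <- seq(n_train + 1, n)    # most recent observations

print(range(ds[train_idx]))
print(range(ds[test_idx]))
```

Under this split the test window is January to June 2009, so it contains no November or December peak and the accuracy figures say little about how the models handle the peak months.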
# arima_boost
model_fit_arima_boosted <- arima_boost(
min_n = 2,
learn_rate = 0.0000015
) %>%
set_engine(engine = "auto_arima_xgboost") %>%
fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = F),
data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
set_engine("randomForest")
workflow_fit_rf <- workflow() %>%
add_model(model_spec_rf) %>%
add_recipe(recipe_spec %>% step_rm(ds)) %>%
fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
step_date(ds, features = "month", ordinal = FALSE) %>%
step_mutate(ds_num = as.numeric(ds)) %>%
step_normalize(ds_num) %>%
step_rm(ds)
wflw_fit_mars <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
# Model Spec
model_spec <- prophet_boost(
learn_rate = 0.1
) %>%
set_engine("prophet_xgboost")
# Fit Spec (note: fitted to log(y), unlike the three models above, which are fitted to y)
model_fit <- model_spec %>%
fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
data = training(splits))
model_fit
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
##
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
##
## ---
## Model 2: XGBoost Errors
##
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0,
## colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1,
## subsample = 1, objective = "reg:squarederror"), data = x$data,
## nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
# Model Spec
model_spec <- seasonal_reg() %>%
set_engine("stlm_ets")
# Fit Spec
model_fit_ses <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- seasonal_reg() %>%
set_engine("stlm_arima")
# Fit Spec
model_fit_sta <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- prophet_reg() %>%
set_engine("prophet")
# Fit Spec
model_fit_p <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
model_fit_arima_boosted,
wflw_fit_mars,
workflow_fit_rf,
model_fit,
model_fit_ses,
model_fit_sta,
model_fit_p)
models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
## .model_id .model .model_desc
## <int> <list> <chr>
## 1 1 <fit[+]> ARIMA(1,0,0)(1,1,0)[12] W/ XGBOOST ERRORS
## 2 2 <workflow> EARTH
## 3 3 <workflow> RANDOMFOREST
## 4 4 <fit[+]> PROPHET W/ XGBOOST ERRORS
## 5 5 <fit[+]> SEASONAL DECOMP: ETS(A,N,N)
## 6 6 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0)
## 7 7 <fit[+]> PROPHET
calibration_table <- models_tbl %>%
modeltime_calibrate(testing(splits))
calibration_table %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table

| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
|---|---|---|---|---|---|---|---|---|
| 1 | ARIMA(1,0,0)(1,1,0)[12] W/ XGBOOST ERRORS | Test | 38137.72 | 29.86 | 1.67 | 36.84 | 41839.34 | 0.70 |
| 2 | EARTH | Test | 27460.44 | 24.25 | 1.20 | 19.84 | 34975.66 | 0.52 |
| 3 | RANDOMFOREST | Test | 17449.68 | 15.24 | 0.76 | 13.50 | 22710.56 | 0.45 |
| 4 | PROPHET W/ XGBOOST ERRORS | Test | 0.10 | 0.82 | 0.51 | 0.82 | 0.11 | 0.61 |
| 5 | SEASONAL DECOMP: ETS(A,N,N) | Test | 0.11 | 0.91 | 0.57 | 0.91 | 0.11 | 0.65 |
| 6 | SEASONAL DECOMP: ARIMA(0,1,0) | Test | 0.11 | 0.91 | 0.57 | 0.91 | 0.11 | 0.65 |
| 7 | PROPHET | Test | 0.11 | 0.95 | 0.60 | 0.96 | 0.14 | 0.61 |
calibration_table %>%
modeltime_forecast(actual_data = r_mass) %>%
plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
modeltime_refit(data = r_mass)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# forecast 36 months
# (Hiding the first 3 models in the interactive plot shows the remaining, lower-error forecasts in more detail)
refit_tbl %>%
modeltime_forecast(h = "36 months", actual_data = r_mass) %>%
filter(.model_desc != 'ACTUAL') %>%
plot_modeltime_forecast(
.legend_max_width = 25, # For mobile screens
.interactive = TRUE
)
# author:Cai
# food predict
food$ds <- month
food$y <- food$weight_sum
r_food <- data.frame(food['ds'], food['y'])
# Data visualisation
r_food %>%
plot_time_series(ds,y)
# Split Data 90/10 (prop = 0.9)
splits <- initial_time_split(r_food, prop = 0.9)
recipe_spec <- recipe(y ~ ds, training(splits)) %>%
step_timeseries_signature(ds) %>%
step_fourier(ds, period = 91.25, K = 1) %>%
step_dummy(all_nominal())
recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 46
## ds y ds_index.num ds_year ds_year.iso ds_half ds_quarter
## <date> <dbl> <dbl> <int> <int> <int> <int>
## 1 2004-07-01 21754203. 1088640000 2004 2004 2 3
## 2 2004-08-01 21107094. 1091318400 2004 2004 2 3
## 3 2004-09-01 22477486. 1093996800 2004 2004 2 3
## 4 2004-10-01 26811262. 1096588800 2004 2004 2 4
## 5 2004-11-01 49708883. 1099267200 2004 2004 2 4
## 6 2004-12-01 37671981. 1101859200 2004 2004 2 4
## 7 2005-01-01 19218996. 1104537600 2005 2004 1 1
## 8 2005-02-01 17898041. 1107216000 2005 2005 1 1
## 9 2005-03-01 21150720. 1109635200 2005 2005 1 1
## 10 2005-04-01 19079235. 1112313600 2005 2005 1 2
## # … with 44 more rows, and 39 more variables: ds_month <int>,
## # ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## # ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## # ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## # ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## # ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_sin91.25_K1 <dbl>,
## # ds_cos91.25_K1 <dbl>, ds_month.lbl_01 <dbl>, ds_month.lbl_02 <dbl>, …
# arima_boost
model_fit_arima_boosted <- arima_boost(
min_n = 2,
learn_rate = 0.000015
) %>%
set_engine(engine = "auto_arima_xgboost") %>%
fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = FALSE),
data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
set_engine("randomForest")
workflow_fit_rf <- workflow() %>%
add_model(model_spec_rf) %>%
add_recipe(recipe_spec %>% step_rm(ds)) %>%
fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
step_date(ds, features = "month", ordinal = FALSE) %>%
step_mutate(ds_num = as.numeric(ds)) %>%
step_normalize(ds_num) %>%
step_rm(ds)
wflw_fit_mars <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
# Model Spec
model_fit_pro_boost <- prophet_boost(
learn_rate = 0.1
) %>%
set_engine("prophet_xgboost")
# Fit Spec
if (TRUE) {
model_fit <- model_fit_pro_boost %>%
fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
data = training(splits))
model_fit
}
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
##
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
##
## ---
## Model 2: XGBoost Errors
##
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0,
## colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1,
## subsample = 1, objective = "reg:squarederror"), data = x$data,
## nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
# Model Spec
model_spec <- seasonal_reg() %>%
set_engine("stlm_ets")
# Fit Spec
model_fit_ses <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- seasonal_reg() %>%
set_engine("stlm_arima")
# Fit Spec
model_fit_sta <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- prophet_reg() %>%
set_engine("prophet")
# Fit Spec
model_fit_p <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
model_fit_arima_boosted,
wflw_fit_mars,
workflow_fit_rf,
model_fit,
model_fit_ses,
model_fit_sta,
model_fit_p
)
models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
## .model_id .model .model_desc
## <int> <list> <chr>
## 1 1 <fit[+]> ARIMA(2,0,0)(0,1,0)[12] W/ XGBOOST ERRORS
## 2 2 <workflow> EARTH
## 3 3 <workflow> RANDOMFOREST
## 4 4 <fit[+]> PROPHET W/ XGBOOST ERRORS
## 5 5 <fit[+]> SEASONAL DECOMP: ETS(A,N,N)
## 6 6 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,3)
## 7 7 <fit[+]> PROPHET
calibration_table <- models_tbl %>%
modeltime_calibrate(testing(splits))
calibration_table %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = TRUE)
calibration_table %>%
modeltime_forecast(actual_data = r_food) %>%
plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
modeltime_refit(data = r_food)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# (Hiding the first 3 models in the interactive plot shows the remaining, lower-error forecasts in more detail)
refit_tbl %>%
modeltime_forecast(h = "36 months", actual_data = r_food) %>%
filter(.model_desc != 'ACTUAL') %>%
plot_modeltime_forecast(
.legend_max_width = 25, # For mobile screens
.interactive = TRUE
)
# author:Cai
# drug predict
drug$ds <- month
drug$y <- drug$weight_sum
r_drug <- data.frame(drug['ds'], drug['y'])
# Data visualisation
r_drug %>%
plot_time_series(ds,y)
# Split Data 90/10 (prop = 0.9)
splits <- initial_time_split(r_drug, prop = 0.9)
recipe_spec <- recipe(y ~ ds, training(splits)) %>%
step_timeseries_signature(ds) %>%
step_fourier(ds, period = 365, K = 8) %>%
step_dummy(all_nominal())
recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 60
## ds y ds_index.num ds_year ds_year.iso ds_half ds_quarter
## <date> <dbl> <dbl> <int> <int> <int> <int>
## 1 2004-07-01 313263. 1088640000 2004 2004 2 3
## 2 2004-08-01 310946. 1091318400 2004 2004 2 3
## 3 2004-09-01 314866. 1093996800 2004 2004 2 3
## 4 2004-10-01 351334. 1096588800 2004 2004 2 4
## 5 2004-11-01 460019. 1099267200 2004 2004 2 4
## 6 2004-12-01 407903. 1101859200 2004 2004 2 4
## 7 2005-01-01 338459. 1104537600 2005 2004 1 1
## 8 2005-02-01 345270. 1107216000 2005 2005 1 1
## 9 2005-03-01 342112. 1109635200 2005 2005 1 1
## 10 2005-04-01 320271. 1112313600 2005 2005 1 2
## # … with 44 more rows, and 53 more variables: ds_month <int>,
## # ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## # ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## # ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## # ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## # ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_sin365_K1 <dbl>,
## # ds_cos365_K1 <dbl>, ds_sin365_K2 <dbl>, ds_cos365_K2 <dbl>, …
# arima_boost
model_fit_arima_boosted <- arima_boost(
min_n = 2,
learn_rate = 0.00015
) %>%
set_engine(engine = "auto_arima_xgboost") %>%
fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = FALSE),
data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
set_engine("randomForest")
workflow_fit_rf <- workflow() %>%
add_model(model_spec_rf) %>%
add_recipe(recipe_spec %>% step_rm(ds)) %>%
fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
step_date(ds, features = "month", ordinal = FALSE) %>%
step_mutate(ds_num = as.numeric(ds)) %>%
step_normalize(ds_num) %>%
step_rm(ds)
wflw_fit_mars <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
# Model Spec
model_fit_pro_boost <- prophet_boost(
learn_rate = 0.1
) %>%
set_engine("prophet_xgboost")
# Fit Spec
if (TRUE) {
model_fit <- model_fit_pro_boost %>%
fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
data = training(splits))
model_fit
}
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
##
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
##
## ---
## Model 2: XGBoost Errors
##
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0,
## colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1,
## subsample = 1, objective = "reg:squarederror"), data = x$data,
## nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
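# Fit Spec
# NOTE: model_spec is not redefined with seasonal_reg() %>% set_engine("stlm_ets") in this
# section, so it still holds the prophet_reg() spec from the food section. model_fit_ses is
# therefore a Prophet fit here, which is why model 5 in the table below reads PROPHET.
model_fit_ses <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))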
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_spec <- seasonal_reg() %>%
set_engine("stlm_arima")
# Fit Spec
model_fit_sta <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- prophet_reg() %>%
set_engine("prophet")
# Fit Spec
model_fit_p <- model_spec %>%
fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
model_fit_arima_boosted,
wflw_fit_mars,
workflow_fit_rf,
model_fit,
model_fit_ses,
model_fit_sta,
model_fit_p
)
models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
## .model_id .model .model_desc
## <int> <list> <chr>
## 1 1 <fit[+]> ARIMA(1,1,0)(0,1,0)[12] W/ XGBOOST ERRORS
## 2 2 <workflow> EARTH
## 3 3 <workflow> RANDOMFOREST
## 4 4 <fit[+]> PROPHET W/ XGBOOST ERRORS
## 5 5 <fit[+]> PROPHET
## 6 6 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0)
## 7 7 <fit[+]> PROPHET
calibration_table <- models_tbl %>%
modeltime_calibrate(testing(splits))
calibration_table %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = TRUE)
calibration_table %>%
modeltime_forecast(actual_data = r_drug) %>%
plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
modeltime_refit(data = r_drug)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# (Hiding the first 3 models in the interactive plot shows the remaining, lower-error forecasts in more detail)
refit_tbl %>%
modeltime_forecast(h = "36 months", actual_data = r_drug) %>%
filter(.model_desc != 'ACTUAL') %>%
plot_modeltime_forecast(
.legend_max_width = 25, # For mobile screens
.interactive = TRUE
)
On the traditional models:
Residual test: the p-value is greater than 0.05, so the residuals cannot be regarded as stationary, and an ARIMA model is therefore not suitable for these data.
We used time regression to predict the trend in the annual data. Sales at all three types of supermarket show an upward trend.
The Holt-Winters seasonal method consists of a forecasting equation and three smoothing equations. The model identified monthly and quarterly seasonal patterns and a growth trend at the end of the data, and its forecasts match the test data. However, the two HW seasonal variants did not fit equally well on some of the data.
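For reference, the forecasting equation and the three smoothing equations can be written as follows. This is the standard textbook additive form, shown only as a sketch: the report does not state the fitted parameter values, and the multiplicative variant replaces the seasonal additions and subtractions with multiplications and divisions. The symbols are generic: $\ell_t$ is the level, $b_t$ the trend, $s_t$ the seasonal component, $m$ the seasonal period (12 for monthly data), $h$ the forecast horizon, and $\alpha$, $\beta^{*}$, $\gamma$ the smoothing parameters.

```latex
\begin{align}
\hat{y}_{t+h|t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor \\
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^{*}\,(\ell_t - \ell_{t-1}) + (1-\beta^{*})\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m}
\end{align}
```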
For the monthly forecasts, we used the STL and ETS models to forecast the seasonal pattern and the trend. They show an upward trend over the next three years.
On the integrated models: auto_arima_xgboost, randomForest, earth, prophet_xgboost, stlm_ets, stlm_arima, prophet. (modeltime combines time series data well with machine learning models.)
Prophet has the advantage that it automatically calculates change points within the first 80 percent of the historical data (changepoint.range = 0.8) and uses them to estimate the trend, from which future periods can be predicted. XGBoost is effective at fitting the residuals. The RMSE of the ARIMA model is particularly large, which we attribute to the data not suiting an ARIMA model; note, however, that the ARIMA, EARTH and random forest models are fit on y while the other four are fit on log(y), so the error metrics of the two groups are on different scales. The XGBoost fit to the Prophet residuals converges with good results.
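The change point mechanism can be illustrated with a small base R sketch. It is a simplified illustration of the idea, not Prophet's actual code: candidate change points are spread evenly over the first `changepoint.range` share of the training history, using the values printed in the model summary above (25 candidates, range 0.8) and the 54 monthly training rows. The variable names are our own.

```r
# Monthly index matching the 54 training rows shown in the output above
ds <- seq(as.Date("2004-07-01"), by = "month", length.out = 54)

n_changepoints    <- 25   # printed as n.changepoints
changepoint_range <- 0.8  # printed as changepoint.range

# Only the first 80% of the history is eligible for change points
hist_size <- floor(length(ds) * changepoint_range)

# Evenly spaced candidate positions inside that window (first point excluded)
cp_index <- round(seq(1, hist_size, length.out = n_changepoints + 1))[-1]
candidate_changepoints <- ds[cp_index]

# Earliest and latest candidate dates
print(range(candidate_changepoints))
```

With only 43 eligible months, the 25 candidates sit one or two months apart. Prophet then estimates a trend-rate adjustment at each candidate, with most adjustments shrunk towards zero by `changepoint.prior.scale`.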
RANDOMFOREST has the advantage of handling non-linear regression problems, but here its RMSE remains high.
EARTH is a piecewise (segmented) regression. Again, it does not perform well here.
The good performers are prophet_xgboost, stlm_ets, stlm_arima and prophet. stlm_ets and stlm_arima are seasonal models; prophet_xgboost differs from prophet in that XGBoost is additionally trained on the Prophet residuals. Because these four models are fit on log(y), which is easy to calculate, their predicted values are small, but the trend and seasonality are predicted more accurately. If you remove the first three models from the plot, you can see the details of the other four.
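Because some models in the table are fit on y and others on log(y), their error metrics are not directly comparable. A minimal sketch of how the two scales relate is given below; the numbers are hypothetical and chosen only for illustration, and `rmse()` is a helper defined here, not a modeltime function.

```r
# Hypothetical values for illustration only; not taken from the report's data
actual   <- c(120000, 135000, 150000)       # sales on the original scale
pred_log <- log(c(118000, 140000, 149000))  # predictions from a model fit on log(y)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Error as a log-scale model reports it: small, because log(y) is small
rmse_log <- rmse(log(actual), pred_log)

# Error after back-transforming with exp(): same units as models fit on raw y
rmse_raw <- rmse(actual, exp(pred_log))

print(c(log_scale = rmse_log, original_scale = rmse_raw))
```

Back-transforming the log-scale forecasts with `exp()` before computing accuracy would put all seven models on the same scale.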
The XGBoost component uses fixed parameters. Tuning them could improve accuracy, but because the Prophet component already works well on these data, the additional improvement is likely to be small.
As modeltime was used here only with months and years, there are no quarterly forecasts for the novel models.
All the models capture the upward trend. However, the novel models are more accurate and detailed than the time regression model.
Different models should be used to fulfil different forecasting needs on different data intervals (or different amounts of data, less so for quarters versus years).
For example, we use both time regression and machine learning algorithms to forecast yearly trends, but the machine learning framework differs from the traditional one. In the modeltime package the imported data are on a monthly scale, so for 2004 and 2009, which are each missing half a year of data, it is more flexible for forecasting yearly trends. For example, it will help us calculate the trend to 06 December, instead of only predicting the trend from 2010 to December 2012; the data processing interval is more flexible.
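The effect of the half-year gaps on annual totals can be seen in a short base R sketch. The monthly values are hypothetical (a constant 100 per month) and the column names `n_months` and `complete` are our own; only the date range, July 2004 to June 2009, follows the data shown above.

```r
# Hypothetical monthly series covering the same date range as the report's data
ds <- seq(as.Date("2004-07-01"), as.Date("2009-06-01"), by = "month")
monthly <- data.frame(ds = ds, y = rep(100, length(ds)))
monthly$year <- as.integer(format(monthly$ds, "%Y"))

# Annual totals and the number of months contributing to each total
annual   <- aggregate(y ~ year, data = monthly, FUN = sum)
n_months <- aggregate(y ~ year, data = monthly, FUN = length)
annual$n_months <- n_months$y

# Flag years whose total is based on fewer than 12 months
annual$complete <- annual$n_months == 12
print(annual)
```

The totals for 2004 and 2009 are half those of the complete years, which is why a model working directly on the monthly series avoids the distortion that annual aggregation introduces.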
Exponential smoothing and Holt-Winters perform very well in seasonal forecasting, but combining traditional models with machine learning algorithms gives even better results. Among the novel models for time series problems there are not only machine learning models but, more commonly, deep learning models; most of what we found during the literature search related to deep learning. However, given the limited preparation time for the exam, and since training a deep learning model takes more than a month, we did not choose a deep learning model this time, but a faster-training machine learning model. Although there were some limitations, we were able to complete all the tasks.